11 research outputs found

    The Lazy Bootstrap. A Fast Resampling Method for Evaluating Latent Class Model Fit

    The latent class model is a powerful unsupervised clustering algorithm for categorical data. Many statistics exist to test the fit of the latent class model. However, traditional methods to evaluate those fit statistics are not always useful. Asymptotic distributions are not always known, and empirical reference distributions can be very time consuming to obtain. In this paper we propose a fast resampling scheme with which any type of model fit can be assessed. We illustrate it here on the latent class model, but the methodology can be applied in any situation. The principle behind the lazy bootstrap method is to specify a statistic which captures the characteristics of the data that a model should capture correctly. If those characteristics in the observed data and in model-generated data are very different, we can assume that the model could not have produced the observed data. With this method we achieve the flexibility of tests from the Bayesian framework, while only needing maximum likelihood estimates. We provide a step-wise algorithm with which the fit of a model can be assessed based on the characteristics we as researchers find important. In a Monte Carlo study we show that the method has very low Type I error rates for all illustrated statistics. Power to reject a model depended largely on the type of statistic that was used and on sample size. We applied the method to an empirical data set on clinical subgroups with risk of myocardial infarction and compared the results directly to the parametric bootstrap. The results of our method were highly similar to those obtained by the parametric bootstrap, while the required computations differed by three orders of magnitude in favour of our method. Comment: This is an adaptation of a chapter of a PhD dissertation available at https://pure.uvt.nl/portal/files/19030880/Kollenburg_Computer_13_11_2017.pd

    Deciding on the starting number of classes of a latent class tree

    In recent studies, latent class tree (LCT) modeling has been proposed as a convenient alternative to standard latent class (LC) analysis. Instead of using an estimation method in which all classes are formed simultaneously given the specified number of classes, in LCT analysis a hierarchical structure of mutually linked classes is obtained by sequentially splitting classes into two subclasses. The resulting tree structure gives a clear insight into how the classes are formed and how solutions with different numbers of classes are substantively linked to one another. A limitation of the current LCT modeling approach is that it allows only for binary splits, which in certain situations may be too restrictive. Especially at the root node of the tree, where an initial set of classes is created based on the most dominant associations present in the data, it may make sense to use a model with more than two classes. In this article, we propose a modification of the LCT approach that allows for a nonbinary split at the root node, and we provide methods to determine the appropriate number of classes in this first split, based either on theoretical grounds or on a relative improvement of fit measure. This novel approach can also be seen as a hybrid of a standard LC model and a binary LCT model, in which an initial, oversimplified but interpretable model is refined using an LCT approach. Furthermore, we show how to apply an LCT model when a nonstandard LC model is required. These new approaches are illustrated using two empirical applications: one on social capital and the other on (post)materialism.

    How to Define and Test an Indirect Moderation Model: The Missing Link in Regression-Based Path Models

    Two of the most important extensions of the basic regression model are moderated effects (due to interactions) and mediated effects (i.e., indirect effects). Combinations of these effects may also be present. In this work, an important yet missing combination is presented that can determine whether a moderating effect itself is mediated by another variable. This ‘indirect moderation’ model can be assessed by a four-step decision tree which guides the user through the necessary regression analyses to infer or refute indirect moderation. A simulation experiment shows how the method works under some basic scenarios.

    Assessing model fit in latent class analysis when asymptotics do not hold

    The application of latent class (LC) analysis involves evaluating the LC model using goodness-of-fit statistics. To assess the misfit of a specified model, say with the Pearson chi-squared statistic, a p-value can be obtained using an asymptotic reference distribution. However, asymptotic p-values are not valid when the sample size is not large and/or the analyzed contingency table is sparse. Another problem is that for various other conceivable global and local fit measures, asymptotic distributions are not readily available. An alternative way to obtain the p-value for the statistic of interest is by constructing its empirical reference distribution using resampling techniques such as the parametric bootstrap or the posterior predictive check (PPC). In the current paper, we show how to apply the parametric bootstrap and two versions of the PPC to obtain empirical p-values for a number of commonly used global and local fit statistics within the context of LC analysis. The main difference between the PPC using test statistics and the parametric bootstrap is that the former takes into account parameter uncertainty. The PPC using discrepancies has the advantage that it is computationally much less intensive than the other two resampling methods. In a Monte Carlo study we evaluated Type I error rates and power of these resampling methods when used for global and local goodness-of-fit testing in LC analysis. Results show that both the bootstrap and the PPC using test statistics are generally good alternatives to asymptotic p-values and can also be used when (asymptotic) distributions are not known. Nominal Type I error rates were not met when the sample size was small and the contingency table had many cells. Overall, the PPC using test statistics was somewhat more conservative than the parametric bootstrap. We have also replicated previous research suggesting that the Pearson χ² statistic should in many cases be preferred over the likelihood-ratio G² statistic. Power to reject a model for which the number of LCs was one less than in the population was very high, unless sample size was small. When the contingency tables were very sparse, the total bivariate residual (TBVR) statistic, which is based on bivariate relationships, still had very high power, signifying its usefulness in assessing model fit. Keywords: goodness-of-fit, posterior predictive check, parametric bootstrap, latent class analysis
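    The parametric bootstrap p-value described in the abstract above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: the fitted model is a one-class LC model (i.e., independence of binary items), chosen because its maximum likelihood estimates are simply the item proportions, and the function names `pearson_x2` and `parametric_bootstrap_p` are hypothetical.

```python
import numpy as np

def pearson_x2(data):
    """Pearson X^2 for a one-class (independence) model on binary items.

    Assumption for illustration: the model is refitted inside this function,
    with ML estimates equal to the observed item proportions.
    """
    data = np.asarray(data, dtype=int)
    n, j = data.shape
    p = data.mean(axis=0)                                  # ML item probabilities
    # Enumerate all 2^j response patterns; column k holds bit k of the index.
    patterns = (np.arange(2 ** j)[:, None] >> np.arange(j)) & 1
    expected = n * np.prod(np.where(patterns == 1, p, 1 - p), axis=1)
    codes = data @ (1 << np.arange(j))                     # pattern index per row
    observed = np.bincount(codes, minlength=2 ** j)
    keep = expected > 0                                    # skip structurally empty cells
    return np.sum((observed[keep] - expected[keep]) ** 2 / expected[keep])

def parametric_bootstrap_p(data, n_boot=500, seed=0):
    """Empirical p-value: share of replicate statistics >= the observed one."""
    data = np.asarray(data, dtype=int)
    rng = np.random.default_rng(seed)
    n, j = data.shape
    p_hat = data.mean(axis=0)                              # fit model to observed data
    x2_obs = pearson_x2(data)
    x2_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Generate a replicate data set from the fitted model, then refit.
        sim = (rng.random((n, j)) < p_hat).astype(int)
        x2_boot[b] = pearson_x2(sim)
    return x2_obs, float(np.mean(x2_boot >= x2_obs))
```

    For a model with more than one class, the refitting step would require an EM-type estimator in place of the item proportions; the resampling logic itself would stay the same.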

    Predictive discarding of wafers based on power leakage predictions from single layer misalignment data

    Photolithography is a process used in the manufacturing of dies, which are at the core of complex integrated circuits. During this process several layers of semiconducting material are stacked on top of each other. Precise alignment of the layers is crucial to the performance of a die. Upon completion, each die is subjected to several electrical tests. If many dies of a wafer fail the tests, the whole wafer is considered faulty and has to be discarded or reworked. This paper proposes the use of machine learning models to predict the outcome of a crucial test for MOS power leakage from misalignments of a single layer. Wafers which are predicted to be faulty when finished can be predictively discarded, saving costs and resources otherwise spent on finishing the faulty wafer.

    Hour-by-hour physical activity patterns of adults aged 45-65 years: a cross-sectional study

    Background: Limited information exists on hour-by-hour physical activity (PA) patterns among adults aged 45-65 years. Therefore, this study aimed to distinguish typical hour-by-hour PA patterns, and examined which individuals typically adopt certain PA patterns. Methods: Accelerometers measured light and moderate-vigorous PA. GIS data provided proportions of land use within 800 m and 1600 m buffers around participants' homes. Latent class analyses were performed to distinguish PA patterns and groups of individuals with similar PA patterns. Results: Four PA patterns were identified: a morning light PA pattern, a mid-day moderate-vigorous PA pattern, an overall inactive pattern and an overall active pattern. Groups of individuals with similar PA patterns differed in ethnicity, dog ownership, and the proportion of roads, sports terrain, and larger green and blue space within their residential areas. Conclusions: Four typical hour-by-hour PA patterns, and three groups of individuals with similar patterns, were distinguished. It is this combination that can substantially contribute to the development of more tailored policies and interventions. PA patterns were only to a limited extent associated with personal and residential characteristics, suggesting that other factors such as work time regimes, family life and leisure may also have considerable impact on the distribution of PA throughout the day.